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Abstract 

We provide another framework of iterative algorithms based on thresholding, feedback and 
null space tuning for sparse signal recovery arising in sparse representations and compressed 
sensing. Several thresholding algorithms with various feedbacks are derived, which are seen as 
. . . exceedingly effective and fast. Convergence results are also provided. The core algorithm is 

! shown to converge in finite many steps under a (preconditioned) restricted isometry condition. 

I The algorithms are seen as particularly effective for large scale problems. Numerical studies 

' about the effectiveness and the speed of the algorithms are also presented. 

^ I Keywords: 

' sparse representation, compressed sensing, null space tuning, hard thresholding, feedback, 

restricted isometry principle 

1. Introduction 

A basic underdetermined linear inverse problem is about the solution to the system of linear 



c/3 

O ■ equations 



^ I where A £ R"^^ (n <^ N) and b G M" are known. In the past few years, sparsity constraint 

CD ' has been a popular regularization approach toward the solution of such inverse problems. The 

, problems of sparse representation and compressed sensing are typical examples. 

I The goal of sparse representation is to approximate a signal 6 by a linear combination of 

the least number of elementary signals drawn from a dictionary A. Equivalently, we are to find 
. the sparsest coefficient x such that Ax = b. In compressed sensing, signals are assumed to be 

(N ■ 

sparse in some transform domain. The purpose is to recover the coefficient x (and the signal) 
from a surprisingly small number of linear measurements Ax = b. Evidently, the central theme 
in compressed sensing is also to find sparse solutions to underdetermined linear systems. 
^ I With the sparsity constraint, the basic problem here involves finding the sparsest solutions 

satisfying the linear equations. In other words, one wishes to solve an ^o-™iiii™ization problem 



Ax = b, (1) 



(Pq) min ||x||o s.t. Ax = b, 

where || • ||o is a quasi-norm standing for the number of the nonzero entries. 

(Pq) is clearly combinatorial in nature. It is NP-hard in general [? ]. The renowned advances 
in this area lie fundamentally in the replacement of (Pq) with a convex relaxation 

(Pi) min ||x||i s.t. Ax = b. 
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See also a series of articles dealing with the equivalence between (Pq) and (Pi), e.g., [? ? ? ? 
????]. Evidently, (-Pi) can be solved accurately by interior-point methods [? ? ] and a 
number of other different methods. 

Among others, "greedy algorithms" are another class of popular means of finding sparse 
solutions. Two typical representative approaches are Matching Pursuit (MP) and Orthogonal 
Matching Pursuit (OMP), e.g., [???]. In addition, a number of variants of the greedy pursuit 
algorithms have also been proposed by various authors, e.g., stagewise orthogonal matching 
pursuit (StOMP) [? ], compressive sampling matching pursuit (CoSaMP) [? ] and subspace 
pursuit (SP) [? ], etc. 

A third class of algorithms for sparse solutions to underdetermined linear inverse problems 
are iterative thresholding/shrinkage algorithms, which are known for their simplicity and the 
ease in implementations. Most iterative thresholding/shrinkage algorithms are motivated by 
minimizing a cost function, which combines a quadratic error term with a sparsity-promoting 
regularization term, i.e.. 



Various iterative hard/soft thresholding algorithms [??????], gradient-descent methods 
[???], and Bregman iterations [? ? ] are representatives. Among this class of works, there 
is one proposed by Blumensath and Davies [? ], in which the ^i-regularization term is replaced 
by A||x||o. An iterative hard thresholding (IHT) algorithm within the majorization minorization 
(MM) framework is analyzed [? ]. It was also shown that IHT converges to a local minimum of 
the ^o-regularized cost function under some conditions. 

In [? ], Donoho and Maliki combines exact solution to small linear system with thresholding 
before and after the solution to derive a more sophisticated scheme, named two-stage threshold- 
ing (TST). Very recently, Foucart has proposed a hard thresholding pursuit (HTP) algorithm 
[? ]. In essence, HTP can be regarded as a hybrid of IHT and CoSaMP. 

In this article, a class of algorithms combining thresholding, feedbacks and null space tun- 
ing is proposed to find sparse solutions. The proposed algorithms are brought into a concise 
framework of null space tuning (NST). It turns out that the mechanism of NST improves the 
performance of the algorithms significantly. Several sparsity enhancing operators are incorpo- 
rated into the NST to develop various algorithms. As examples, several specific algorithms are 
provided which are rather effective in terms of the "recoverability" to a larger number of non-zero 
components/coefficients. These algorithms are shown to be exceedingly fast. Some results about 
the theoretical performance are also presented. Among this class, two representative algorithms 
are shown to converge under commonly known conditions. 

The organization of this article is as follows. A brief description of the common framework of 
null space tuning is given in Section [2j The core algorithm, null space tuning with hard thresh- 
olding and feedback (NST-I-HT-I-FB), is introduced in Section [3l In Section HJ we present two 
other algorithms possessing the feedback nature, along with a brief study of the computational 
issues of the NST based algorithms. Section[5]is dedicated to the theoretical convergence studies 
of the NST-|-HT-|-FB algorithms. We show that the algorithm allows stable recovery of sparse 
vectors if the measurement matrix satisfies commonly known conditions. Extensive numerical 
tests are presented in Section [6] to justify the advantages of the algorithms in practice. 

2. A common framework of the approximation and null space tuning algorithms 

Assume that A has full (row) rank and there exists a desired vector x such that Ax = b. 
We propose the following iterative framework of the approximation and null space tuning (NST) 
algorithms 



min — ||y4x — ftlln -|- Allxlh . 



(NST) 
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Here n{x^) approximates the desired solution by various principles, and, F := I — A*[AA*)^^A 
is the orthogonal projection onto ker^. The feasibility of is assumed, which guarantees that 
the sequence {x'^} are all feasible. Obviously, we hope that ^ x as k increases. 
Due to the feasibility of the sequence {x'^}, the NST step can be rewritten as 

= x'' +[I - A*{AA*)-^A]{u'' -x'') (2) 
= u'' + A*{AA*)-\b- Au''), 

which indicates that x'^'^^ — u^ is perpendicular to the hyperplane {x : Ax = b}. Therefore, x^^^ 
is the orthogonal projection of u'' onto the feasible set. 

In the NST procedure, P (or A* {AA*)~^) can be computed off-line as its appearance does 
not change during iterations. For very large scale problems, it is very useful to build the two 
matrices based on the structure of A. Oftentimes, the fast Fourier transforms, the wavelet 
transforms, etc., can be used to facilitate the construction of P. Using the structure of A, the 
computation may require substantially less memory. 

Evidently, the NST framework places much emphasis on the role of the projection. The choice 
of the approximation operator D is definitely a separate topic itself. Different applications call 
for different operators. As a special case of the NST framework, if B is set as a projection onto a 
convex set (which is not what we suggest here), then the associated NST procedure will clearly 
be one instance of the POCS (projection onto convex sets) method. 

Clearly, with the sparsity constraint, the fundamental of the approximation operator D lies 
in enhancing the sparsity and the feasibility. We would then derive the associated iterative 
algorithms. In this work, the choices of ID are not projections onto convex sets. Convergence 
proofs are also provided, which are anything but trivial since these algorithms are not part of 
the POCS family. 

Specifically, three algorithms will be discussed in this article. These methods have the 
following appearances ordered in their importance relative to this article. For simplicity, we 
denote by the index set corresponding to the most s significant entries of x'', by the 
complement set of Tj. in {1,2,- •• ,iV}, and by At,^ the submatrix consisting of columns of A 
indexed by , respectively. We define the hard thresholding operator which keeps the largest 
s entries (in magnitude) and sets all the others to zeros. 

• NST + hard-thresholding + feedback (NST+HT+FB) 

' 4, = 4, + iAn^n)-'A*^^AT;:X^^^, 

< utc = 0, 

k 

• NST + hard-thresholding + suboptimal feedback (NST+HT+subFB) 

< u^c = 0, 

^ x>'+^ = x''+¥{u'' - x''). 

Here A'^ < I/P^^^tJ^- 

• NST + stretched hard-thresholding (NST+stretched HT) 

( u'' = e'^Tsix''), 

I ^.fc+l — j.k F{u'^ — x^). 

A reasonable choice of 6^ is ||6||i/||j4j'j.x^ 
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3. Null space tuning with hard thresholding and feedback (NST+HT+FB) 

As mentioned earher, B functions fundamentally to approximate the s sparse solution as well 
as possible. For simplicity, is always set as the least squares solution, i.e., x^ = A*{AA*)^^b. 
Throughout of this article, we shall assume that there exists an s sparse vector x such that 
Ax = b. 

In the core part of the algorithms, the approximation operator D is set as thresholding 
plus feedback. We begin by commenting that NST+HT+FB is surprisingly efficient and the 
algorithm converges in finite steps under common assumptions. The convergence is again proven 
in Section [5j 

Since the sequence {rc'^} are always feasible in the framework of the NST algorithms, one 
may split b as 

b = At.Xt, + At'^Xt'^. 

" -^k k -^k 

In most (if not all) thresholding algorithms, thresholding (hard or soft) is taken by merely 
keeping the entries of x^ on T^, and thereby completely abandons the contribution of At^x^c 

to the measurement b. Though x^c gradually diminishes as /c — ?• oo (as shown in Section [5]), it 

is not difficult to observe that the contribution of At'^xI^c to b can be quite significant at initial 

^ k 

iterations. Therefore, simple (hard) thresholding alone can be quite infeasible at earlier stages. 

We propose an approximation operator D that combines the hard thresholding (HT) and 
a feedback (FB) to enhance the feasibility of u''. This approach is termed the NST+HT+FB 
algorithm. 

The main point is to feed the contribution of At<^x%'c to b back to im(^7^ ), the image of 

^ k 

At^. . That is, we require to find a proper rf^ such that 

AT^rf K Atcx^t^^. 

A straightforward way is to set 

= arg min 1 1 ?7 - ^Tf a^T^ 1 1 2 , (3) 

■q k k 

which has the best/least-square solution 

r^^ = {A*t^At,)-^A*t^At^^x^t^^. 
The NST+HT+FB algorithm is then established as follows 



(NST+HT+FB) 



Hirp Xi 

utc = 0, 

k 



Figure [T] depicts the geometric interpretation of the NST based algorithms. Here n = 1, 
N = 2 and s = 1. The red trajectory is a geometric interpretation of the NST+HT+FB 
algorithm. The algorithm is deemed converged when — 6||2 is less than some predetermined 
value, or \\u^ — u^~^\\2 is sufficiently small. The convergence proof is placed in Section [5j 
Algorithm [1] is the pseudo-code of the NST+HT+FB procedure. 

We may also compare the NST+HT+FB with a simpler thresholding-only scheme within 
the NST framework which we discuss immediately later. It is this very feedback adjustment 
that greatly improves the rate of convergence. 

In fact, for cases in R^, NST+HT+FB stops in just 1 iteration. Let us examine the example 
illustrated in Figured! where the underdetermined system is aixi + 02X2 = b and ai > 02 > 0. 
Clearly, the exact solution is x = [0 -^]'^ and x'^ = [ ^^^^'i ^^'^^■i ]'^ ■ The equation a2r]^ = aix^ 

has definitely an exact solution rj^ = ^aiXj*. This feedback in turn gives rise to u° = [0 —]'^, 
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Ax = h\ 




Figure 1: Geometric description of the NST based algorithms. The blue (dashed) and the gray (dash dotted) 
illustrate the trajectories of NST+HT and NST+ST, respectively. The red solid demonstrates the iteration of 
NST+HT+FB. 



Algorithm 1: NST+HT+FB 
Input: A, b, s, Xo, ei, €2; 
Output: n, x; 

= 0; k = 0; x^ = Xq] 
nO = T,(x°); 

while \\Auk - felb/H^lb > ei and \\u'' - n''~^||2/||n''~^||2 > £2 do 
k = k + l; 

return u^, x^; 



which is the exact solution. By the same principle, the readers shall see in Section [5] that the 
NST+HT+FB algorithm generally converges in finite steps. 

As a comparison, let us also understand a bit more about a thresholding-only scheme under 
the NST framework. One natural choice of the approximation operator D is to keep only the 
largest s entries, which gives rise to the simpler NST+HT algorithm 

(NST+HT) I ^,+1 ^^^^ j p^^, _ 

Clearly, u'' can be regarded as a projection onto the highly non-convex "^o-ball" {{x : \\x\\o < s}) 
in some sense. 

We comment that P(ii'^ — x^) is the nearest element in ker Atou^ — x^ in the ^2-iiorm sense, 

i.e., 

P(u'' - x^) = arg min \\{u'^ - x'^) - z\\2. 

Though — x^ cuts out the {N — s) smaller entries of x^ completely, ¥{u^ — x^) provides nec- 
essary corrections in the least squares sense. This can explain partially why NST+HT enhances 
the sparsity gradually. The convergence of NST+HT will be discussed in Section [Sj where we 
show that limfc_^oo = x under commonly known conditions. 

Remark 1. As an example of the NST framework, Candes and Romberg's work [? ] is in fact 
a null space tuning with soft thresholding (NST+ST) in an attempt to provide a solution to (Pi). 
Soft thresholding approximates a projection onto the convex ^i-ball. Therefore, the convergence 
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of their algorithm could fall within the framework POCS. The NST+HT method, however, 
is not a POCS algorithm. As we shall show in Section [Sj the convergence of NST+HT can 
not generally follow the threads of POCS convergence arguments. A preconditioned restricted 
isometry condition is introduced instead. 

The blue (dashed) and the gray (dash dotted) in Figure [1] illustrate the trajectories of 
NST+HT and NST+ST, respectively. Clearly, the hard thresholding operator finds by keep- 
ing the most significant entry of and the soft thresholding operator finds vf^ by shrinking all 
entries of toward zero. Figure [T] explains intuitively why NST+HT possesses greater rate of 
convergence than that of NST+ST in general. Figure [1] also provides an intuitive description 
(in red solid) why the NST+HT+FB tends to converge faster. 

The NST+HT algorithm has seemingly some resemblance to the method of iterative hard 
thresholding (IHT). Immediate discussions about the differences and the connections between 
NST+HT and IHT are the very next concern. 

Differences and connections between NST+HT and IHT 
The iteration of IHT proposed in, e.g., [? ? ] is 

fTHTl / ^'=ir.(x'=), 

^ ' \ x'^+i = u'' + A*{b- Au''). 

NST+HT, on the other hand, has an equivalent form 

( u'' = Tsix''), 

\ x''+^ =u'' + A*{AA*)"^{b-Au''). 

Evidently, in a special case where A is a Parseval frame, i.e., AA* = I, NST+HT reduces to 
IHT. 

However, for general measurement matrices, considerable amount of numerical studies demon- 
strate that NST+HT is far more effective than IHT. More precisely, NST+HT has a much greater 
ability to recover signals consisting of a much larger number of nonzero components than that of 
IHT. These will be illustrated in Section[6l The key rational of such performance differentiations 
lies in the null space tuning step. 

A point of view from error correction ( denosing ) 

We may also peek into the contaminated signal situation, and discover that the role of 
{AA*)~^ is vital at reducing the noise. That explains why NST+HT outperforms IHT in realistic 
situations where signals are contaminated by noises, or when signals are approximately sparse. 
In the following, we assume x = x'^ + v, where is the ideal s sparse signal and v is the Gaussian 
white noise. 

As analyzed in [? ], IHT can be regarded as a majorization minorization (MM) algorithm 
to the problem 

(Po,s) min||^x-6||2 s.t. ||x||o < s. 

Evidently, solving {Po,s) is to find the best s-term approximation to the original signal from the 
given measurements. 

Let us observe, however, that the cost function in (-Po,s) is not quite ideal for finding good 
approximations to contaminated (not exactly sparse) signals. The following is a brief discussion. 

Suppose u is the solution to {Po,s)- The original signal can be written as x = u + e, 
where e is the approximation error (nonzero by the fact that x is not precisely sparse), and the 
measurement can be written as A{u + e) = b. For simplicity, we assume there is no measurement 
error. By the singular value decomposition, A = UTiV* , where U and V are unitary and S is the 
n X N singular matrix having in the main diagonal singular values {cri}^^^. It then follows that 
||^e||2 = ||i7Sy*e||2 = ||Sy*e||2 = ||Se||2, where e := V*e. Let us note that the i^^ component 
of e is therefore the approximation error in the direction identified by the i^^ column of V. 
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Figure 2: Plots of ||u - x" ||2/||s:'' II2 as a function of the noise level e for NST+HT and IHT. Here s = 5, ||a:''||2 = 1 
and ||w||2 = e. NST+HT does have the advantage at recovering contaminated Gaussian sparse vectors over that 
of IHT. 

Recall that u is the solution to {Po,s)- Hence, \\Au — b\\2 = ||^e||2 = ||5]e||2 possesses 
the least .^2-iiorm. That is to say (Po,s) is to minimize ||Se||2. However, minimizing ||Se||2 
would penalize more heavily the error components corresponding to the larger singular values. 
Hence, the solution to (Po,s) would depend on the singular value structure of the measurement 
matrix A, which can be problematic. Because the original signal is completely independent of 
the measurement system A, the approximation error corresponding to the appropriate solution 
should be penalized fairly in all (singular vector) directions. 

We can now explain why NST+HT tends to perform better by the same MM principle. The 
null space tuning step can be written further as 

^k+i = ^fc + A*{AA*y\b - Au'') 

= u'' + A*{AA*y^[{AA*yh - {AA*y^Au''] 

Consequently, with an equivalent measurement equation {AA*)~'2 Ax = {AA*)~'2b and the mea- 
surement matrix {AA*)^2 A , NST+HT may also be regarded as a majorization minorization 
(MM) algorithm for the solution of the following problem 

(P^,J mm \\iAA*)--2{Ax-b)h s.t. \\x\\o < s. 

Similar to the previous analysis, {Pq g) is to minimize || (^A*)~ 2 ^e||2, where e = x — u is the 
approximation error, x is the contaminated signal and u is the solution to (Pq s) well. With 

A = UY1V\ {AA*)~^A = U[I 0]V*, and \\{AA*)-^Ae\\2 = \\U[I 0]V*e\\2 = ||[/0]e||2. Therefore, 
the approximation error is penalized fairly in all directions identified by the associated singular 
vectors of the measurement system. 

This explains why NST+HT functions better in denoising than IHT for general measure- 
ment matrices from the viewpoint of MM procedures. For a quick demonstration. Figure [2] is 
the numerical example showing this very fact about the error correction differentials. In this 
experiment, we just set x = x'^ + v. Here x^ is the original sparse signal with the unit ^2-iiorm 
and V is the Gaussian white noise with the energy e. 

We leave the convergence study to Section [5l and move on to the rest algorithms of the NST 
family. 

4. Other choices of D and computational issues 

In the NST+HT+FB scheme, a matrix inversion (A^^At,^)"^ is required, and T^. changes 
during iterations. The inversion may then need to be computed in every step of the iterations. 
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For large scale problems, such matrix inversion in every step (as in OMP, CoSaMP, SP and 
HTP) can still be computationally intensive. 

We propose here some other choices of the approximation operator B having the feedback 
functionality, but with substantially less computational complexities. 



4-1- NST+HT+suboptimal feedbacks 
Note that the role of {A^ Ar^y'^A^ 



A 



TcXrpc 



is to calculate the feedback of the tail con- 



tribution to b. As long as A^^ At,. is well conditioned, which is a typical requirement by the 
well-known restricted isometry property (RIP) for convergence proofs (Section [5]), {A^^At^)~^ 
can be approximated by X'^I with A'^ being on the order of the spectrum of {A!^^AT^.)~^ ■ Ev- 
idently, a natural approximation of the feedback can be simplified to X^A'!^^At^x!^c. We then 
reach the first suboptimal scheme 



(NST+HT+subFB) 



uh.c = 0, 

X —— X ~\~ 



Here we choose A < 1/ 



1 2 for the evident stability consideration of the algorithm. 



4.2. NST+HT with stretching 

Let us recall that the purpose of the feedback is to enhance the feasibility of . We also 
observe that, to the most s significant components of x^ , the feedback action results in oftentimes 
the magnification of the components. With this observation, we also propose a stretched/scaled 
version of NST+HT 



(NST+stretchedHT) 



= e'"Is{x^), 



X' 



k+l 



Evidently, we hope At^6 xl^ approximates b better than At^x^ . A reasonable choice of 6 is 
||6||i/Pt,4JIi- 

4-3. Sparsity adaptation 

We note that all previous algorithms require the knowledge of the sparsity of the desired 
solution. This seems wishful in most applications. To make the NST based algorithms more 
applicable, we may begin with a conservative estimation sq of the sparsity of the real solution, 
and increase the sparsity by s' gradually. Taking NST+HT as an example, Algorithm[2]describes 
the pseudo-code of the adaptive NST+HT algorithm. 



Algorithm 2: Adaptive NST+HT algorithm 
Input: A, b, sq, s' , si, ei, €2; 
Output: u, x; 

Uo = 0; s = So; 
xo = A*{AA*)-^b; 

) = (NST+HT)(A, 6, s, Xo, ei, £2); 
while \\Aun — fe||2/||^||2 > ei and — Uolb/H^olb > ^2 ctnd s <= si do 

Uq — Un, 

S — S ~\~ S , Xf) — X, 

\Ufi', X ) = (NST+HT)(^, 6, s, Xo, ei, £2); 
return ti„, x; 
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4.4- Computational complexity per iteration 

We now study the computational issues about the NST based algorithms briefly. 

For the null space tuning step, A* {AA*)~^ does not change the appearance during iterations. 
Consequently, if the inversion {AA*)~^ is calculated off-line and stored in the memory, the NST 
step requires n{N + n + s) multiplications. As mentioned earlier, for very large scale problems, 
it is necessary to exploit the structure of the matrices. The computational complexity of the 
null space tuning step may be reduced substantially to O(A^logn). 

The computational complexity of the approximation operator varies greatly for different 
algorithms. For NST+HT+FB, solving the least squares problem ([3]) by Cholesky factorization 
requires ^ns^ + + 0{ns) multiplications [? ]. For NST+HT+subFB, the typical number of 
multiplicative operations of calculating A^^^t^cx^c is nN. Furthermore, if the Gramian matrix 
A* A is stored in the memory, the number is reduced to s{N — s). For NST+stretchedHT, to 
calculate the scalar requires sn multiplications and a division. 

To sum up, when {AA*)~^ is stored in the memory and n is on the same order of N, 
NST+HT+FB has a computational complexity at each iteration ^ns"^ + ^s^ + 0{nN) for general 
matrices. For the other three NST based algorithms, the computational complexity is 0{nN). 

5. Convergence 

In this section, theoretical performances of the NST+HT and the NST+HT+FB algorithms 
are presented. Convergence results are first established when the sequence {T^} are the same. 
This is followed by clarifications of the connection between NST+HT and NST+HT+FB. Before 
continuing, some preparations are in the order. 

Proposition 2 (Weyl's inequality [? ]). Let X,Y £ ([^"xn Hermitian matrices and let the 
eigenvalues \i{X), \i{Y) and \i{X + Y) he arranged in descending orders. Then for each 
/c = 1,2,... ,n, 

\k{x) + A„(y) < \k{x + Y)< \k{x) + Ai(y). 

Lemma 3. Let A £ M."-^^ be a Parseval frame and T C {1, . . . , iV} with \T\ < n. If At^A^^ is 
invertible, then ||^t^^||2 < 1 and \\AtcA'^c\\2 = 1- 

Proof. Because Aj-AI^ and At<:A^c are all non-negative definite, ||^7^^^||2 = \i{A'X'A'^) and 
||y4TcA^c||2 = \i{AT<:A^c),T:esi)ect\\e\y. We then require to prove Ai(ArA^) < 1 and Ai(ylT':^T'=) ~ 
1. 

Since A is a Parseval frame, AA* = AtA^+At<^A^c = I, which means \i{ATA^+AT'^A^c) = 
1, for i = 1, . . . , n. According to Weyl's inequality, 

Xi{AtA^) + \n{ATcA^c) < \i{AtA^ + AtcA'^c) = 1. 

Furthermore, Aj-'^A'^c is invertible, implying A„(AtcA^c) > 0. Then Ai(^r^y) < 1 is followed. 
For the second claim, we derive 

Xn{ATA*j.) + Xi{AtcA^c) > XniAxA^ + AtcA^c) = 1 

and 

Xi{AtcA*tc) + Xn{ATA*j^) < Xi{At'^A^c + AtA^) = 1. 

These two inequalities lead to Xi{AtcA^c) + A„(AtA5^) = 1. Since |T| < n, A„(AT^r) = 0. 
Hence, Xi{At<:A1^c) = 1 as asserted. □ 

Theorem 4. Let A be a Parseval frame. Suppose A'^At and At<:A^c are invertible with T C 
{1,...,A^} and \T\ = s. For NST+HT, if Tj = Tj+i = ... = T for some integer j, then 
lim^^oo = x^. Here x^ has the expression 

x\, = x{. + {A*rpATY^ A*rpATcx{.^ 

and 

x\,, = A*rpc {I - AT{A*j^ATy^AT) Ar^xlpc. 
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Proof. Without loss of generality, we assume T = {1, . . . ,s} and j = 0. We then require to 
prove 

x\, = x^ + {A^At)~^A^Atcx^c 

and 

xj., = A*Tc {I - At{A*tAt)~^A*t) Atcx'^c. 
When A is a Parseval frame and Tq = Ti = . . . = T, the second step of NST+HT gives rise 

to 











" " 


k+1 





















A*j 
A* 



[At Aj 



AtpcAT<^x^ 



Therefore, 



x> 



X. 



fc+1 

T 

fc+1 



+ 



A*j, + A*j,{AtcA*j,,) + ■■■ + A*j,{AtcA*j,, 



A^c^A^c A^c) 



At'^Xj'c 



We now establish the required statements 



lim A*T + A*t{At^A*tc) + • • • + A1r{AT-A1rcf = {A1rAT)~'Alr 



(4) 



and 



lim (^t-^tO =I- AT{A*rrAT)-'A^. 

k—^oo 



Since AtA^ and A^Aj- are all non-negative definite, H^T^rlb = Ai(AtA^) = Xi{A^At) = 
\\A'^At\\2- Lemma [3] implies, when At^A^^c is invertible, ||j47^j4^||2 < 1. Together with the 
invertibility of A^At, we then derive < / — A'^At < I. 

Furthermore, since Aj-AI^ + At^A^^c = AA* = I, for each integer A; > 0, 



A*t.{At^A*tc)^ =A*t{I - AtA*t) 

k 



1=0 



-AtA*^)' 



{-A*TATyA*T 

1=0 ^ ^ 



=(/ - A*tAt)''A*t. 



Consequently, utilizing the Neumann series. 



lim A^ + A^{AtcA^c) H h A^{AtcA^. 

fc— s>oo 



* \fc 



: lim (A^ + {I - A^At)A*t + ••• + (/- A^ATfA*^ 

fc— s>oo 



/=0 

-1 A* 



\ fc— >oo 

={A*TATy'A*T 
The first claim is then followed 



lim y^{I-A*TATy A*T 
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k-1 



Similarly, for integer k > 0, 

{AtcA*tc)'' =I-{I- AtcA*tc) ^{ATcA*T.y 

1=0 

k-l 

=1 - At At ^(/ - AtA^)^ 

1=0 

=I-ArA*^Y.J2(L)(-^TA*, 

1=0 m=0 ^ ^ 



\l=0 m=0 



m 



--I-At[Y,{I-AtAt)'\a*t. 



J=0 



Exploiting the Neumann series again leads to 



k-l 



lim {AtcA*tc)'' =I-At{ lim V (/ - ^^^t)' 

\ 1=0 / 

=1 - At{A*tAt)~'^A*t. 



The second claim is then followed. 



□ 



Remark 5. In real applications, the invertibility of A^At and At^A^c is a weak/minor as- 
sumption. It's noteworthy to point out that, to prove ([3]), the Neumann series can not be 
applied directly to 

lim A*rp (l + At'^A^c H h (AtcA'^c)^) . 



fc— >oo 

This is because, when \T\ < n and At^A^c is invertible, ||A7^c^^c||2 = as demonstrated in 
Lemma [3l 

Theorem 6. Suppose A*j.At is invertible with T C {1,. . . ,N} and \T\ = s. For NST+HT+FB, 
if Tj = Tj+i = . . . = T for some integer j, then x^~^^ = x^~^'^ = ... = x^. Here x^ has the 
expression 

xi = xi + \(AtAT)~^At + AUAA*)^^ (I - AT(AtAT)~^At)] Arcxi 



and 



_ + [{A*TATy^A*j, + A*T{AA*y^ {I - At(At^t)~^^t)] ^t--t= 
x*,, = A*Tc (AA*)-^ {I - At{A*tAt)-^A*t) Atcx>^c. 



Proof. Without loss of generality, we assume T = {!,..., s}. Then the second step of NST+HT+FB 
gives rise to 



Xrp 

J+l 

Xrpc 



+ 



^A*rpAT<^xipc 




j 

Xrp 


) 






X"' 





J 

Xrp^ 



+ (l - A* (AA*)-^ [At a 



{A*rpAT)~^A*pATc 
-I 



xip + {A*rpAT)~'' ApATcxip^ 




+ 



A 



{AA*)-^ {I - At{A*tAt)-^A*t) Atcx^.. 
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That is x^'^^ = x^. 

We next prove F(u^~^^ — x^'^^) =0, which imphes x- 



For simphcity, define 



= / - At{A^At)~'^A^. Evidently, A^Q = and QAt = 0. 
Since x^t^ = ^t4^^*)~^' 



j+1 



x- 



J+1 



{A*^At)-^A^Atc 
-I 



-I 



^A%,Atc 



L/rp(. 



A*rrc{AA* 



\A*rpAT)-^ A^Atc A*p, {A A*) 

* 



-A*rr,{AA*Y^<\ 



AT<:xip. 



According to At'^A^c = AA* 



AtA: 



{AipAT)-^A^AT-A*Tc{AA* 



--{A*rrATy^A*rp {AA* - AtA*^) {AA* 
--{A*rpAT)-^A*rrQ - A^{AA*Y^q 
-- - A*rp{AA*)-^<[^. 



Therefore, 



At<^xL. 



-A*rp{AA*)~\ 
-A*rp,{AA*)~H 

[I - A*{AA*y^A) A*{AA*)-^QAtcx^^ 



=0. 



The claim is then followed. 



□ 



Remark 7. Theorem [B] suggests that Tj^i = Tj is a stopping criteria for NST+HT+FB. This 
theorem also explains partially why NST+HT+FB converges in finite steps. 

The next theorem touches on the convergence of the NST+HT algorithm, and establishes a 
convergence relationship between that of NST+HT+FB and that of NST+HT. 

To avoid confusion in the next theorem, we denote by {T^} the sequences of the index set 
corresponding to the s most significant entries produced by the NST+HT+FB algorithm, and 
by {xk} the corresponding solution. 

Theorem 8. Let A be a Parseval frame. Suppose A'^At and At<:A'^c are invertible with T C 
{1, . . . , A^} and \T\ = s. Assume Ti = Tj+i = • • • = T for NST+HT for some integer i and 



Tj = Tj^i for NST+HT+FB for some integer j. If x^ 



then lim 



fc— ^-oo 



Proof. Since = x-', = Tj = T. For NST+HT+FB, when A is a Parseval frame, one can 
easily derive from Theorem [6] that 



4+1 = 4 + {A*tAt)-^A*tAtcx: 



^3 



and 



A^c {I - At{A^At)~'^A*t) At 



cx: 



which is exactly the limit of the sequence of x'^ in NST+HT, as demonstrated in Theorem HI □ 

Evidently, Theorem [8] establishes the connections between NST+HT and NST+HT+FB. For 
Parseval frames A, the output of NST+HT+FB and the the limit of NST+HT are the same in 
this very sense. 

In [? ] and various other articles, Candes and Tao introduced the notion of restricted isometry 
property (RIP) to analyze the performance of the solution to (-Pi). In the study of NST+HT and 
NST+HT+FB, it is natural to introduce a notion of preconditioned restricted isometry property 
(P-RIP) to analyze the performance of NST+HT and NST+HT+FB. To state our main results, 
we first recall the definition of restricted isometry constants. 
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Definition 9. /? / For each integers s = 1, 2, . . ., the restricted isometry constant 5s of a matrix 
A is defined as the smallest number 5s such that 

{l-5s)\\x\\l < \\Ax\\l < {l + 5s)\\x\\l (5) 

holds for all s-sparse vectors x. 

Definition 10. For each integers s = 1,2,..., the preconditioned restricted isometry constant 
7s of a matrix A is defined as the smallest number js such that 

{l-js)\\xg<\\iAA*)"^Axg (6) 

holds for all s-sparse vectors x. 

In fact, the preconditioned restricted isometry constant 7s cliaracterizes the restricted isom- 
etry property of the preconditioned matrix {AA*)~'2 A. Since 

\\{AA*)-'2Ax\\2 < \\{AA*)-^A\\2\\x\\2 = \\x\\2, 

7s is actually the smallest number such that, for all s-sparse vectors x, 

(1 -7.)||x||2 < \\iAA*)-Ux\\l < {l + 7s)\\x\\l (7) 



1 



That indicates 7s(^) = 5s{{AA*) 2 A). Evidently, for Parseval frames, since AA* = I, jsiA) = 
5s{A). We shall term either ([6]) or ([7]) the preconditioned restricted isometry property (P-RIP). 

Remark 11. We should note that the P-RIP is not particularly a stronger (than RIP) as- 
sumption over the matrix A. In fact, an upper bound of the preconditioned restricted isometry 
constant 7s maybe expressed as a function of the RIP constant 5s and the largest singular value, 
as stated in the following proposition. 

Proposition 12. Let cTmax be the largest singular value of A. For each integers s, the P-RIP 
constant 7s < 1 — ^— 

max 

Proof. Since amax is the largest singular value of A, {AA*)~^ — — / is positive-semidefinite. 
Given any s-sparse vector x. 



\\{AA*)-^Axf2 = x*A* 

^max 



{AA*)-^ ^—I 



Ax > 0. 



Consequently, 



\\{AA*)-hAx\\l > -^\\Ax\\l > ^-^Ml 

^m.ax ^rri,ax 



The upper bound of 7s is then followed. □ 

As a result, it is not hard to see from Proposition [12] that the P-RIP is satisfied with high 
probability if A is a (Gaussian) random matrix, just as in RIP. We refer to studies of extreme 
singular values of random matrices, e.g., [? ? ], for additional references. 

Before tuning to the convergence of the algorithms, let us also discuss a preferable expression 
of the P-RIP constants. Definition 1101 actually implies / — A^{AA*)~^ At < ^sl, where \T\ < s. 
It then follows that ||/ — A^{AA*)~^At\\2 < 7s, i-e., ||Pt,t||2 < 7s- Here Pt,t' is the submatrix 
of the projection P consisting of rows indexed by T and columns indexed by T' . Similarly, it 
follows from Definition that ||/ — y4^747-||2 < 5s. 

Our main result is that, with requirements over the (preconditioned) restricted isometry 
constants of A, both procedures (NST+HT and NST-|-HT-|-FB) reduce the error in each iteration 
and are guaranteed to converge to limits with error bounds depending on the tail of the real 
solution and the noise. 
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Theorem 13. Suppose Ax + e = b where e is the measurement error or noise. Let x'*^ be the best 
s-term approximation of the real solution x. If the {^sY^ order P-RIP constant of A satisfies 
73s < 0.5, then in NST+HT satisfies 

h'' - x^\2 < /||n° - x^\2 + -^—\\e\\2, 

1 - p 

where p = 273s and e = {AA*)~^ (A(x — x'^) + e). 

Proof. Let T" be the index set corresponding to the s most significant entries of the real solution, 
and let be the index set of the best s-term approximation of x^ . Define T := r^UTfcUrfc-i. 
Obviously, \T\ < 3s. 

For NST+HT, since is the best s-term approximation of x^. 



With a slight abuse of notations, we then denote by — x^^t a vector consisting of the 

entries of (u*^~^ - x") indexed by T and zeros on T^. Since x^ = u^~^ + A*{AA*)~^{b — Au''~^), 
as demonstrated in ([2]), and Ax'^ + A{x — x") + e = 5, one has. 



<2D\{u'' - xKu''-^ + A*{AA*)-\b - Au''-^) 



<2D\ (v!' - P(u'=-^ - x«) + A*{AA*Y 2 e 

=2$H^(n^ - x«)t, [P(u^-^ - x«)t]t) + 2$n^u^ - x«, A*(^^*)"^e 
<2||tx'= - x«||2||Pt,t||2||'u'=~^ - x% + 2\\u'' - xtt||2||e||2, 
where stands for the operation of taking the real part of a variable. It then follows that 

\\u^ - x% <2\\S>T,Th\\u^~'^ - x% + 2||e||2 
<273,||n^-^-x«||2 + 2||e||2 
=p\\u^~^ -X^||2 + 2||g||2. 

Therefore, 

Wv!" - x% < p^\\u^ - x'lb + -T^—Weh- 

1 - p 

The claim is then followed. □ 

One can easily reach the following corollary characterizing the behavior of NST+HT in the 
noiseless case where x is exactly sparse. 

Corollary 14. Let x be the solution to Ax = b with only s sparsity. If the (3s)*'^ order P-RIP 
constant of A satisfies 73^ < 0.5, then the sequence of in NST+HT converges to x. 

Theorem 15. Suppose Ax + e = 6 with the measurement error e. Let x" be the best s-term 
approximation of the real solution x. If the P-RIP and RIP constants of A satisfy 62s+V2^3s < 1; 
then in NST+HT+FB satisfies 

Wv!" - x^\\2 < /ll-u" - x''||2 + -^||e||2, 

1-/9 



where p = r = and e = A{x - x») + e. 
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Proof. Similar to the proof of Theorem \T3\ we denote by the index set corresponding to 
the most s significant entries of the real solution and let xt be the vector which keeps only 
the entries indexed by T. For convenience, we define T := U U and Q := A*{AA*)-2A, 
respectively. Evidently, |r| < 2s and P = / — Q. 

According to the iteration of NST+HT+FB, for any z E being supported on T^, 



; Au'' - b, Az) = (^r,4, + At, i^nM) '^n^T^Al: " ^' ^n'^T, ^ 
[^rMn^, + Ati:xI^c -b),ZT,)= 0. 
The last step is due to the feasibility of x^. The inner product can also be written as 
(^Au'' - b, Az^ = l^Au^ - Ax^ - A{x - x^) - e, Az'j 
= (a{u^ -x^) -e,Az^ = 0. 

Therefore, {u^ - x'^,A*Az) = {e,Az). 

With a slight abuse of notations, we denote by {u^ — x^)t, a vector consisting of the entries 
of {u^ — x^) indexed by and zeros on T|. Clearly, (u^ — x^)t, is supported on T^. Therefore, 

(v!' - xKA*A{u^ - x«)t,) = (e, A{u^ - x«)t,) • 

Consequently, 

\\{u'-x^)t,\\1={u'-x\{u'-x^)t,) 

= {u" - xK (/ - A*A){u'' - x^)t,) + (e, A{u'' - x^)t,)) 

<\\u'' - x%\\I - A*TATh\\{u'' - x«)tJ|2 + WehUiu'' - X^)tJ2 

<S2s\\u'' - X^hWiu'' - X«)tJ|2 + V^ + W\2\\iu'' - X^)tJ2. 

It then follows that 

\\{u'' - X^)tJ2 < S2s\\u'' - X% + y/l + 6s\\e\\2. 

Since and are supported on and T'^ respectively, 

Wu'' - x% =||(n^ - x»)t, + in'' - x»)t\tJ|2 

<||(n*^ - x^)tJ2 + ||4\T,. - 4\tJ|2 



<52s\\u'' - X^\\2 + \/r+X||e||2 + lk^\Tjl2- 



Therefore, 



I k tti|2 / 1 II tl II ( a/I + ||~|| 
\U -^1l2<Y3^ll4\Tjb + Y3j^lH|2. 



Recall that corresponds to the most s significant entries of x , it follows that 

II fc ||2 ^ II fc ||2 

Exploiting the fact x^ = x^~^ +¥{u^~^ — x^~^), and eliminating the common terms over T^DT^, 
one has 

\\[x'~'+F{u'^-'-x'^-')h»\Tj\2 < \\[x'-'+nu'~'-x'^-')]T,\r»\\2. (8) 
Since Ax''~^ = Ax^ + e = b, for the left side of 

= \\[u''-^ + A*{AA*)-kAx''-' - Au''-')]t»\tJ2 

= ||[^fc-i + A*{AA*)--2{Ax^ + e - Au''-^)]t,\t, h 

= \\[u'^-^ + q{J-u>^-^)]T»\^^ + [A*{AA*)-h]T,\Tj2- 
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For the right side of 



[u^-^ + A*{AA*)-^^{Ax''-^ - Au^-^) 



lTfc\Ttt||2 



^ A*{AA*)-^{Ax^ - Au'^-i + e)]r,\T« lb 
.fc-i 



= \\[V{u'-^-x^)]t,\t^ + [A*{AA*)-^~e]T^^TA\^. 
Then ([8]) is equivalent to 

||[n^-i + Q(x» - u^-^)Uyr, + [A* {AA*)-hUyr,h 
<\\nu'^-^ - x^)]T,\ri + [A*{AA*)-h~e]^^^^,\\^ 

Furthermore, by the triangle inequality, 

+ Q(^« _ u^-^)]T,yrj2 <l|[ff^(^'-' - x^)]t,\^tA\2 + \\[A*{AA*)-h]T,\TA\2 

+ ||[^*(AA*)-5e]r«\Tj|2- 

Together with the decomposition 
we have 

\\x%,yr,h <ll[^" - - Q(^» - n'-')]T«\Tjl2 + II [n'^-^ + Q(xtt - n'=-i)]^,\^Jb 
<||[P(xtt - n'=-i)]^,\^J|2 + ||[P(^'=-i - x«)]^^.\^,||2 

+ ||[^*(AA*)^5e]^^^y,||2 + \\[A* {AA*)-'^e]Ti\T^\\2 
<V2 (||[P(x« - n'=-i)]T||2 + ||[A*(^yl*)-^e]T||2) 
<V2 (||Ptut,_i,tut,_J|2||x« - u''-^ + \\A*{AA*)-^\2\\~e\\2 
<^/2{^r,s\\x^ -u''-% + reh). 

Therefore, 

II k tt||2 ^ \/273s II ttii , VT+Ts 

\\u -2;"||2<:; ^||ii -x»||2H j ||e||2. 

1 - 02s 1 - 02s 



It then follows that 



|n'=-x«||i</||u0-x«||2 + -— ||e||2, 

1 - p 



where p = and r= □ 

In the previous proof, we have made use of some techniques introduced by Foucart in [? ? 
]. It is also possible to improve or refine the condition 52s + V^Tss < 1, as demonstrated in [? 
], though it is not a focus of this article. 

Let us also observe that if x is exactly sparse without measurement noise, then the NST+HT+FB 
algorithm converges in finite steps, which really places this algorithm among the very fast algo- 
rithms known up to date. 

Theorem 16. Let x he the solution to Ax = h with s sparsity. If the P-RIP and RIP constants 
of A satisfies 62s + V^^is < 1; then the sequence of in NST+HT+FB converges to x in finite 
many steps. 
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Proof. Since u consists of only s non-zero entries and the sequence of u converges to x, there 
must exist a sufficiently large integer j such that Tj is the support of x. Then, by Theorem [6l 
the immediate next step of NST+HT+FB gives rise to u^~^^ = x. The claim is followed. □ 

Remark 17. Although NST+HT+FB has a pursuit spirit seen in various algorithms such as 
HTP [? ], we should not undervalue the feedback form that we proposed here. The feedback 
mechanism plays a significant role particularly for large scale problems. 

In fact, the point of feedback enables us to derive several other more efficient algorithms, 
as presented in Section [H For large scale problems, these suboptimal algorithms based on the 
feedback idea, NST+HT+subFB and NST+stretchedHT, are seen much more realistic and much 
faster without having to compute inverses {A^^ATf.)~^ in each and every step. 

6. Effectiveness at Large Scales and Numerical examples 

In this section, we support the claim that the proposed algorithms are very effective (partic- 
ularly at large scales) in practice by extensive numerical experiments in four different aspects. 

• The first is to illustrate the overall performance of the NST based algorithms in terms of 
execution-time and comparisons with known fast algorithms. 

• The second is to investigate the frequency of exact /successful recovery of the NST based 
algorithms and comparison with known effective algorithms. 

• The third is about the performance comparison of the algorithms in the noisy cases among 
the NST based algorithms and others. 

• The fourth is to demonstrate the performance differentiations within the class of the NST 
based algorithms. 

Among many, by known algorithms we mean those that are similar to the NST based ap- 
proaches. These include IHT, OMP, HTP and SP, which are all methods attempting to address 
the sparse solutions. One BP algorithm, CVX [? ], is also included in the comparison to under- 
stand the solution capacity between the ^o-™iiii™ization and the ^i-minimization methodologies. 
The comparisons with these representative algorithms are to demonstrate the performance dif- 
ferentiations, and to provide a measurement scale and judgement foundation for the algorithms 
comparison. 

We must comment that the comparisons are far from complete. There are indeed some 
algorithms that we decide not to involve in the comparison. These include, for instance, CoSaMP 
and linearized Bregman iteration. For CoSaMP, the reason is that the main idea and the 
performance of CoSaMP and SP are very similar. For the Bregman iteration, the reasons 
are twofold. One is that Bregman iterations are a means to find an approximate solution to 
(Pi), while our algorithms attempt to solve an £o-™iiii™ization problem. Solutions to (Pi) is 
represented by the CVX in our studies (mostly for the existence of the solution, not for the 
speed of the algorithm). The other is due to the fact that the optimal performances of various 
Bregman iterations require somewhat skillful selections of the parameters. It is not easy to 
ensure a fair comparison. However, from the approximate spirit and the algorithm formulation 
of the Bregman iterations, we suspect that the NST based algorithms outperform the linearized 
Bregman iteration substantially. Our many trials of a known linearized Bregman algorithm with 
kicking confirm such believes. We did not include these trials in the article because we were not 
entirely sure if the parameters of the Bregman iterations have been selected optimally. 

Except for the specific large scale tests, most tests uses a 128 x 256 matrix A with stan- 
dard i.i.d. Gaussian entries. All the columns are normalized to unit ^2-iiorm. The support of 
the sparse signal is chosen randomly. We refer Gaussian sparse signals to those whose nonzero 
entries are drawn independently from the Gaussian distribution with zero mean and unit vari- 
ance. For Bernoulli sparse signals, the nonzero entries are drawn independently from ±1 with 
equiprobability. 



17 



0.02 
0.018 

0.016 

0.014 

<D 0.012 
E 

"5) 0.01 
c 0.008 
0.006 
0.004 
0.002 



-HTP 
-OMP 
-SP 

- NST+HT 

- NST+HT+FB 
NST+HT+subFB 

- NST+stretchedHT 




0.26 0.28 0.3 0.32 0.34 0.36 0.38 0.4 

s/ n 

(a) Small scale problems, n = 128 and N = 256. 
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(b) Large scale problems, n = 5000 and A'^ = 10000. 



Figure 3: Plots of the execution-time as a function of s/n. (a) For small scale problems, NST+HT+FB is the 
most efficient, (b) For large scale problems, the NST based algorithms (not including NST+HT+FB) are more 
than 5 times faster than the other algorithms. 



6.1. Overall execution-time comparison 

For execution-time comparison, we tested both the small scale (128 x 256) and the large 
scale (5000 x 10000) problems. For each sparsity value s, the small scale is tested for 5000 
trials, while the large scale is tested for 20 trials. The execution-time is recorded for every trial. 
The average time is then calculated and plotted in Figure [3l Note, we have set ei = 10"^ and 
62 = as stopping parameters for all the NST based algorithms. The parameter A in the 
NST+HT-|-subFB algorithm has been set to A = 1 in all experiments. 



For small scale problems. Figure 3(a) indicates that NST+HT+FB is the most efficient 
among all the methods compared. The HTP [? ] stands at the second position. The other three 
NST based algorithms spend a little more time than SP. The execution-time of OMP grows 
dramatically as the sparsity (number of nonzeros) increases. 

For large scale problems, however. Figure |3(b)| shows that the NST based algorithms NST+HT, 
NST-|-HT-|-subFB, and NST-|-stretchedHT are much more efficient (more than 5 times faster) 
than all competitive algorithms. The efficiency is particularly evident as the sparsity level in- 
creases. The execution-time of the three NST based algorithms grows much slower than those 
of other algorithms as the sparsity value increases. 

6.2. Overall successful recovery performance and comparison 

The second test is to compare the frequency of exact recovery of the NST based algorithms 
with those of other algorithms. For the parameters of the adaptive NST based algorithms, we 
refer to the last test set. For Gaussian sparse vectors, we set the initial sparsity value sq = 0.5s for 
the adaptive NST+stretchedHT algorithm. For the other three adaptive NST based algorithms, 
we set So = 0.3s. For Bernoulli sparse vectors, we set sq = 0.9s for all the adaptive algorithms. 
To obtain better performance, we set s' = 1 for all scenarios. Each algorithm is tested for 500 
trials for every value of s. An exact recovery is then recorded whenever — 2;||2/||x||2 < 10~^. 
The frequency of the exact recovery as a function of the sparsity measurement ratio s/n is 
plotted. 



The results for Gaussian sparse signals are demonstrated in Figures 4(a) It is evident that 
the adaptive NST based algorithms outperform all other algorithms by a great margin. The 
performance of the nonadaptive NST based algorithms is similar to that of HTP. For the clarity 
of the plots, the results of the other three (adaptive) NST based are not plotted here. Their 
performances are illustrated in Figure [9] in the last numerical experiment. 

Unlike the situation of recovering Gaussian sparse vectors, for Bernoulli sparse vectors, BP- 
CVX performs the best, as demonstrated in Figure [4(b) [ Such an advantage of BP for recovering 
Bernoulli sparse signals is also observed in [? ? ]. 
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Figure 4: (a) Plots of the frequency of exact recovery of Gaussian sparse vectors as a function of s/n. Exact 
recovery means ||it — 2;||2/||a::||2 < 10~^. The adaptive NST based algorithms outperform other algorithms greatly, 
(b) Plots of the frequency of exact recovery of Bernoulli sparse vectors as a function of s/n. BP-CVX possesses 
the best performance and the adaptive NST based algorithms become competitive. 



The adaptive NST+HT algorithm is the next best algorithm, still outperforms OMP, HTP 
and SP significantly. It is also clear that the nonadaptive NST based algorithms outperform 
OMP and HTP by a good margin as well. Evidently, all the algorithms outperform IHT greatly 
in both scenarios, 

6.3. Comparison in contaminated signals and noisy measurements 

This study is to investigate the performance of the NST based and other algorithms in 
the noisy cases, which consists of two numerical experiments. The first is to deal with the 
contaminated signals case. To guarantee fixed signal-to- noise ratio (SNR), the ideal sparse signal 
x", which is contaminated with zero-mean white Gaussian noise v, is normalized to unit ^2-norm. 
We rescale the noise level e := ||u||2 to the specified values and then obtain b = A{x^ + v). The 
second experiment is to deal with the contaminated measurement case. The ideal measurement 
Ax'^ is normalized to unit ^2-iiorm and contaminated with zero- mean white Gaussian noise v, 
whose level e := ||f II2 is rescaled to the specified values. The measurement is then b = Ax^ + v. 
In all the three experiments, 5000 trials are performed with the noise level e = 0, 0.01, . . . , 0.2. 
To guarantee the best performance of OMP [? ], only s iterations are carried out for every trial. 

The results are demonstrated in Figure [SI which implies that all algorithms possess better 
stabilities. It's clearly that OMP outperforms the other algorithms in both cases, but the dif- 
ferentiation is not dramatic when the signals are contaminated. In the NST group, for both 
contaminated signals and noisy measurements, NST+HT+FB outperforms the other three al- 
gorithms slightly. 

6.4- Performance comparison within the class of the NST based algorithms 

We test the overall performance of the NST based algorithms in three numerical experiments. 
The first is to demonstrate the convergence. With fixed value of s, for each of 5000 trials, we 
record the relative error \\u'' — a;||2/||a^||2 for every k. 

Figure [6(a) shows the fact that the NST+HT+FB algorithm converges in finite many steps. 



It demonstrates the first 10 instances of the 5000 trials for recovering Gaussian sparse vectors. 
Figure [6(b)| plots the average errors of all the NST based algorithms as functions of k. 

The second experiment is to illustrate the number of iterations for convergence. For each 
value of s, we first recorded the number for all of 5000 trials. The average number is then 
calculated. Figure [7] plots the average number as a function of the sparsity measurement ratio 
s/n. 
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(a) Signal is contaminated: A(x^ + v) — b 



(b) Measurement is contaminated: Ax^ + v — b 



Figure 5: Plots of |ju — .'c''||2/||a;''||2 as a function of the noise level e for all the NST based algorithms and other 
algorithms. Here s — 20. OMP outperforms the other algorithms in both cases, but the differentiation is not 
dramatic when the signals are contaminated. 




Figure 6: Plots of the error \\u'' — 2;||2/||a;||2 for recovering Gaussian sparse vectors as a function of k for all the 
NST based algorithms. Here s = 30. It takes NST+HT+FB no more than 10 steps for exact recovery. The initial 
relative errors are all set as Hx" — a;|t2/||a;||2. 
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Figure 8: Investigation of the initial sparsity values: plots of the frequency of exact recovery of sparse vectors as 
a function of s/n for the adaptive NST+HT algorithm, (a) Recover Gaussian sparse vectors. The value so = 0.3s 
outperforms the other choices. Smaller than 0.3s value of sq would not make much difference, (b) Recover 
Bernoulli sparse vectors. The value so = 0.9s outperforms the other choices. 



Adaptive NST algorithms 


Gaussian sparse signals 


Bernoulli sparse signals 


Adaptive NST+HT 


0.2s < So < 0.4s 


Sq « 0.9s 


Adaptive NST+HT+FB 


0.2s < So < 0.4s 


Sq S» 0.9s 


Adaptive NST+HT+subFB 


0.2s < So < 0.4s 


Sq 0.9s 


Adaptive NST+stretchedHT 


0.4s < So < 0.5s 


So 0.9s 



Table 1: The near optimal values of so for various adaptive NST based algorithms. 



The obvious conclusion from the first two experiments is that NST+HT+FB converges 
dramatically rapidly than the other three NST based algorithms for small scale problems. We are 
surprised that, in this setting (s = 30, s/n 0.23), it always takes NST+HT+FB no more than 
10 steps to recover sparse vectors exactly. In addition, NST+HT+subFB and NST+stretchedHT 
all possess better convergent behaviors than NST+HT. The first two experiments verify the roles 
of the feedback and the stretch, evidently. 

The third experiment is to examine the performance of the adaptive NST based algorithms 
with different initial sparsity values so in the noiseless case. For simplicity, we just set sq = us and 
s' = 1. Take adaptive NST+HT as examples. With fixed s and < k < 1, we perform adaptive 
NST+HT 500 trials. An exact recovery is then recorded whenever ||n — x||2/||x||2 < 10~^. 
Finally, we plot the frequency of exact recovery as a function of s/n for different values of n. 

Figure [8] demonstrates the frequency of exact recovery as a function of s/n for different k. 
We actually recorded the results for k = 0.1,0.2, . . . ,0.9. However, for better contrast of the 
plots, the representative curves within the transition range are only presented. 

The observation is that the recoverability of the adaptive NST based algorithms relies quite 
heavily on the values of sq- Moreover, an optimal initial sparsity value sq for Gaussian sparse 



signals and Bernoulli sparse signals are different. Specifically, as indicated in Figure 8(a) , Gaus- 
sian sparse signals require smaller initial so for better performance. On the other hand, for 
Bernoulli sparse signals, larger so gives rise to better performance. The other three adaptive 
NST based algorithms all possess the similar principle. The empirical optimal values of so for 
different NST based algorithms are summarized in Table [H 

Figures [9] demonstrates the comparison of the recoverability of the (adaptive) NST group 
for recovering Gaussian sparse vectors. Figure 9(a) compares the nonadaptive NST based al- 
gorithms, while Figure |9(b)| shows the comparison among the adaptive NST based algorithms. 
We comment that horizontal axis is restricted in the transition range. Consequently, even 
though there are some differentiations among the different algorithms, their performances are 
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(a) Nonadaptive NST based algorithms (b) Adaptive NST based algorithms 



Figure 9: Plots of the frequency of exact recovery of Gaussian sparse vectors as a function of sjn for (adaptive) 
NST based algorithms. Within the nonadaptive group, NST+HT and NST+stretchedHT outperform the other 
two nonadaptive algorithms slightly. Within the adaptive group, adaptive NST+HT is the best choice. 

very similar. Otherwise, if we plot the overall horizontal axis, the different curves would be 
non-distinguishable. Similar results hold for Bernoulli sparse vectors. We decide not to show 
the very similar plots within the (adaptive) NST based algorithms. 
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